AITopics | proxy label

proxy label

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Mind the Gaps: Auditing and Reducing Group Inequity in Large-Scale Mobility Prediction

Kumar, Ashwin, Zhang, Hanyu, Schweidel, David A., Yeoh, William

arXiv.org Artificial Intelligence | Nov-3-2025

Next location prediction underpins a growing number of mobility, retail, and public-health applications, yet its societal impacts remain largely unexplored. In this paper, we audit state-of-the-art mobility prediction models trained on a large-scale dataset, highlighting hidden disparities based on user demographics. Drawing from aggregate census data, we compute the difference in predictive performance on racial and ethnic user groups and show a systematic disparity stemming from the underlying dataset, which produces large differences in accuracy across locations and user groups. To address this, we propose Fairness-Guided Incremental Sampling (FGIS), a group-aware sampling strategy designed for incremental data collection settings. Because individual-level demographic labels are unavailable, we introduce Size-Aware K-Means (SAKM), a clustering method that partitions users in latent mobility space while enforcing census-derived group proportions. This yields proxy racial labels for the four largest groups in the state: Asian, Black, Hispanic, and White. Built on these labels, our sampling algorithm prioritizes users based on expected performance gains and current group representation. This method incrementally constructs training datasets that reduce demographic performance gaps while preserving overall accuracy. Our method reduces total disparity between groups by up to 40% with minimal accuracy trade-offs, as evaluated on a state-of-the-art MetaPath2Vec model and a transformer-encoder model. Improvements are most significant in early sampling stages, highlighting the potential for fairness-aware strategies to deliver meaningful gains even in low-resource settings. Our findings expose structural inequities in mobility prediction pipelines and demonstrate how lightweight, data-centric interventions can improve fairness with little added complexity, especially for low-data applications.

accuracy, artificial intelligence, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2510.2694

Country: North America > United States > Texas (0.16)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Public Health (0.48)
Transportation > Infrastructure & Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.88)
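
The "Mind the Gaps" abstract above describes clustering users while enforcing census-derived group proportions, but it does not spell out how SAKM does this. The following is only a minimal sketch of one way such a constraint could be imposed: a k-means loop whose assignment step is a greedy, capacity-limited matching. The function names, the greedy rule, and the toy data are all assumptions made for illustration and are not taken from the paper.

```python
import math

def capacities_from_proportions(n_users, proportions):
    """Turn group proportions (assumed to sum to 1) into integer cluster sizes."""
    raw = [p * n_users for p in proportions]
    caps = [math.floor(r) for r in raw]
    # Give leftover slots to the clusters with the largest fractional remainder.
    order = sorted(range(len(raw)), key=lambda k: raw[k] - caps[k], reverse=True)
    for k in order[: n_users - sum(caps)]:
        caps[k] += 1
    return caps

def constrained_assign(points, centroids, capacities):
    """Greedy step: take the closest (point, centroid) pairs first, never exceeding a capacity."""
    pairs = sorted(
        (math.dist(p, c), i, k)
        for i, p in enumerate(points)
        for k, c in enumerate(centroids)
    )
    labels = [None] * len(points)
    remaining = list(capacities)
    for _, i, k in pairs:
        if labels[i] is None and remaining[k] > 0:
            labels[i] = k
            remaining[k] -= 1
    return labels

def update_centroids(points, labels, centroids):
    """Recompute each centroid as the mean of its members; keep the old one if empty."""
    new = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new.append(old)
    return new

def size_constrained_kmeans(points, init_centroids, proportions, n_iter=10):
    """Alternate capacity-limited assignment and centroid updates (hypothetical variant)."""
    caps = capacities_from_proportions(len(points), proportions)
    centroids = list(init_centroids)
    labels = []
    for _ in range(n_iter):
        labels = constrained_assign(points, centroids, caps)
        centroids = update_centroids(points, labels, centroids)
    return labels, centroids

# Toy latent embeddings: two visually equal blobs, but the target split is 75% / 25%.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
labels, _ = size_constrained_kmeans(points, [(0, 0), (10, 10)], [0.75, 0.25])
```

In this toy run the cluster sizes follow the requested proportions rather than the geometry of the blobs, which is the property a size-aware method trades against compactness. A greedy matching is not optimal; an exact version would solve an assignment problem at each step.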

Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech

Corrêa, Pedro, Lima, João, Moreno, Victor, Ueda, Lucas, Costa, Paula Dornhofer Paro

arXiv.org Artificial Intelligence | Oct-31-2025

Advancements in spoken language processing have driven the development of spoken language models (SLMs), designed to achieve universal audio understanding by jointly learning text and audio representations for a wide range of tasks. Although promising results have been achieved, there is growing discussion regarding these models' generalization capabilities and the extent to which they truly integrate audio and text modalities in their internal representations. In this work, we evaluate four SLMs on the task of speech emotion recognition using a dataset of emotionally incongruent speech samples, a condition under which the semantic content of the spoken utterance conveys one emotion while speech expressiveness conveys another. Our results indicate that SLMs rely predominantly on textual semantics rather than speech emotion to perform the task, suggesting that text-related representations largely dominate over acoustic representations. We release both the code and the Emotionally Incongruent Synthetic Speech dataset (EMIS) to the community.

artificial intelligence, emotion, natural language, (17 more...)

arXiv.org Artificial Intelligence

2510.25054

Country: South America > Brazil (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.64)
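
The incongruent-speech evaluation described in the abstract above comes down to asking which cue a model's prediction follows when the words and the voice disagree. A rough sketch of that bookkeeping is given below. The record fields (`text_emotion`, `speech_emotion`, `pred`) and the sample values are hypothetical and do not reflect the EMIS dataset's actual schema or the paper's results.

```python
def modality_reliance(samples):
    """Share of predictions that follow the text cue, the speech cue, or neither.

    Each sample is a dict with hypothetical keys 'text_emotion',
    'speech_emotion' and 'pred'. Samples must be incongruent, i.e. the
    two cue labels differ, so a prediction can match at most one of them.
    """
    if not samples:
        raise ValueError("need at least one sample")
    for s in samples:
        if s["text_emotion"] == s["speech_emotion"]:
            raise ValueError("sample is not emotionally incongruent")
    n = len(samples)
    text_hits = sum(s["pred"] == s["text_emotion"] for s in samples)
    speech_hits = sum(s["pred"] == s["speech_emotion"] for s in samples)
    return {
        "text": text_hits / n,
        "speech": speech_hits / n,
        "neither": (n - text_hits - speech_hits) / n,
    }

# Made-up predictions, purely to exercise the function.
demo = [
    {"text_emotion": "happy", "speech_emotion": "sad", "pred": "happy"},
    {"text_emotion": "angry", "speech_emotion": "happy", "pred": "angry"},
    {"text_emotion": "sad", "speech_emotion": "angry", "pred": "sad"},
    {"text_emotion": "happy", "speech_emotion": "angry", "pred": "angry"},
]
reliance = modality_reliance(demo)
```

A "text" share well above the "speech" share would correspond to the text-dominant behaviour the abstract reports.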

Practical Bias Mitigation through Proxy Sensitive Attribute Label Generation

Chaudhary, Bhushan, Pandey, Anubha, Bhatt, Deepak, Tiwari, Darshika

arXiv.org Artificial Intelligence | Dec-26-2023

Machine Learning has attained high success rates in practically every field, including healthcare, finance, and education, based on the accuracy and efficiency of the model's outcome (Dastile, Çelik, and Potsane 2020; Bakator and Radosav 2018). However, these models are biased and exhibit a propensity to favor one demographic group over another in various applications, including credit and loan approval, criminal justice, and resume-based candidate shortlisting (Mehrabi et al. 2021; Gianfrancesco et al. 2018; Yapo and Weiss 2018). The idea of fairness has received a lot of attention recently to combat the discrimination from the [...] of ML models (Dwork et al. 2012; Beutel et al. 2017; Hardt, Price, and Srebro 2016).

[...] Similarly, zip codes can be correlated with race. Hence, the bias gets embedded in the non-sensitive attributes that are used in the model training. Based on this hypothesis, a few initial efforts have been made to mitigate bias in the absence of protected attributes (Grari, Lamprier, and Detyniecki 2022; Lahoti et al. 2020; Yan, Kao, and Ferrara 2020; Zhao et al. 2022). The most recent approach (Zhao et al. 2022) identifies related features that are correlated with the sensitive attributes and would further minimize the correlation between the related features and the model's prediction outcome to learn a fair classifier with respect to the sensitive attribute. However, identification of related features requires [...]

algorithm, bias mitigation algorithm, information, (11 more...)

arXiv.org Artificial Intelligence

2312.15994

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
(10 more...)

Genre: Research Report (0.64)

Industry:

Law (0.66)
Health & Medicine (0.66)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Granular conditional entropy-based attribute reduction for partially labeled data with proxy labels

Gao, Can, Zhou, Jie, Miao, Duoqian, Yue, Xiaodong, Wan, Jun

arXiv.org Artificial Intelligence | Jan-23-2021

Attribute reduction is one of the most important research topics in the theory of rough sets, and many rough sets-based attribute reduction methods have thus been presented. However, most of them are specifically designed for dealing with either labeled data or unlabeled data, while many real-world applications come in the form of partial supervision. In this paper, we propose a rough sets-based semi-supervised attribute reduction method for partially labeled data. Particularly, with the aid of prior class distribution information about data, we first develop a simple yet effective strategy to produce the proxy labels for unlabeled data. Then the concept of information granularity is integrated into the information-theoretic measure, based on which, a novel granular conditional entropy measure is proposed, and its monotonicity is proved in theory. Furthermore, a fast heuristic algorithm is provided to generate the optimal reduct of partially labeled data, which could accelerate the process of attribute reduction by removing irrelevant examples and excluding redundant attributes simultaneously. Extensive experiments conducted on UCI data sets demonstrate that the proposed semi-supervised attribute reduction method is promising and even compares favourably with the supervised methods on labeled data and unlabeled data with true labels in terms of classification performance.

proxy label, reduction, unlabeled data, (16 more...)

arXiv.org Artificial Intelligence

2101.09495

Country:

Asia > China > Guangdong Province > Shenzhen (0.05)
Asia > China > Shanghai > Shanghai (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Genre: Research Report (0.70)

Industry: Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
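
The attribute-reduction abstract above builds on an information-theoretic measure over the partitions induced by attribute subsets. As a hedged illustration, the sketch below uses the ordinary Shannon conditional entropy H(D | B) and a simple greedy forward search. It is not the paper's granular conditional entropy or its fast heuristic, and it assumes every row already carries a label (true or proxy). The function names and the toy table are invented for the example.

```python
from collections import Counter, defaultdict
import math

def conditional_entropy(rows, attrs, decision):
    """H(decision | attrs): entropy of the label within each equivalence class.

    Rows that agree on every attribute in `attrs` fall into the same block.
    """
    blocks = defaultdict(list)
    for row in rows:
        blocks[tuple(row[a] for a in attrs)].append(row[decision])
    n = len(rows)
    h = 0.0
    for labels in blocks.values():
        for count in Counter(labels).values():
            # P(block, label) * log2 P(label | block)
            h -= (count / n) * math.log2(count / len(labels))
    return h

def greedy_reduct(rows, attrs, decision, tol=1e-12):
    """Add attributes one by one until the subset is as informative as the full set."""
    target = conditional_entropy(rows, attrs, decision)
    chosen = []
    while conditional_entropy(rows, chosen, decision) - target > tol:
        candidates = [a for a in attrs if a not in chosen]
        best = min(
            candidates,
            key=lambda a: conditional_entropy(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen

# Toy decision table in which attribute "a" alone determines the label "y".
table = [
    {"a": 0, "b": 0, "c": 1, "y": "neg"},
    {"a": 0, "b": 1, "c": 0, "y": "neg"},
    {"a": 1, "b": 0, "c": 0, "y": "pos"},
    {"a": 1, "b": 1, "c": 1, "y": "pos"},
]
reduct = greedy_reduct(table, ["a", "b", "c"], "y")
```

Adding an attribute can only refine the partition, so this Shannon measure never increases as attributes are added. That monotonic behaviour is what makes a greedy search of this kind terminate at a subset matching the full attribute set.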

Causal Effects of Linguistic Properties

Pryzant, Reid, Card, Dallas, Jurafsky, Dan, Veitch, Victor, Sridhar, Dhanya

arXiv.org Artificial Intelligence | Oct-24-2020

We consider the problem of estimating the causal effects of linguistic properties on downstream outcomes. For example, does writing a complaint politely lead to a faster response time? How much will a positive product review increase sales? This paper focuses on two challenges related to the problem. First, we formalize the causal quantity of interest as the effect of a writer's intent, and establish the assumptions necessary to identify this from observational data. Second, in practice we only have access to noisy proxies for these linguistic properties (e.g., predictions from classifiers and lexicons). We propose an estimator for this setting and prove that its bias is bounded when we perform an adjustment for the text. The method leverages (1) a pre-trained language model (BERT) to adjust for the text, and (2) distant supervision to improve the quality of noisy proxies. We show that our algorithm produces better causal estimates than related methods on two datasets: predicting the effect of music review sentiment on sales, and complaint politeness on response time.

machine learning, natural language, proxy, (18 more...)

arXiv.org Artificial Intelligence

2010.12919

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > Texas (0.04)
(5 more...)

Genre:

Research Report > Experimental Study (0.68)
Research Report > Strength High (0.46)
Research Report > New Finding (0.46)

Industry: Government (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

An Overview of Deep Semi-Supervised Learning

Ouali, Yassine, Hudelot, Céline, Tami, Myriam

arXiv.org Machine Learning | Jul-6-2020

Deep neural networks demonstrated their ability to provide remarkable performances on a wide range of supervised learning tasks (e.g., image classification) when trained on extensive collections of labeled data (e.g., ImageNet). However, creating such large datasets requires a considerable amount of resources, time, and effort. Such resources may not be available in many practical cases, limiting the adoption and the application of many deep learning methods. In a search for more data-efficient deep learning methods to overcome the need for large annotated datasets, there is a rising research interest in semi-supervised learning and its applications to deep neural networks to reduce the amount of labeled data required, by either developing novel methods or adopting existing semi-supervised learning frameworks for a deep learning setting. In this paper, we provide a comprehensive overview of deep semi-supervised learning, starting with an introduction to the field, followed by a summarization of the dominant semi-supervised approaches in deep learning.

artificial intelligence, arxiv preprint arxiv, machine learning, (16 more...)

arXiv.org Machine Learning

2006.05278

Country:

North America > United States > Wisconsin > Dane County > Madison (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > Germany > Saxony > Leipzig (0.04)
(2 more...)

Genre:

Overview (1.00)
Workflow (0.93)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)

Semi-Supervised Learning with Self-Supervised Networks

Tran, Phi Vu

arXiv.org Machine Learning | Jun-25-2019

Recent advances in semi-supervised learning have shown tremendous potential in overcoming a major barrier to the success of modern machine learning algorithms: access to vast amounts of human-labeled training data. Algorithms based on self-ensemble learning and virtual adversarial training can harness the abundance of unlabeled data to produce impressive state-of-the-art results on a number of semi-supervised benchmarks, approaching the performance of strong supervised baselines using only a fraction of the available labeled data. However, these methods often require careful tuning of many hyper-parameters and are usually not easy to implement in practice. In this work, we present a conceptually simple yet effective semi-supervised algorithm based on self-supervised learning to combine semantic feature representations from unlabeled data. Our models are efficiently trained end-to-end for the joint, multi-task learning of labeled and unlabeled data in a single stage. Striving for simplicity and practicality, our approach requires no additional hyper-parameters to tune for optimal performance beyond the standard set for training convolutional neural networks. We conduct a comprehensive empirical evaluation of our models for semi-supervised image classification on SVHN, CIFAR-10 and CIFAR-100, and demonstrate results competitive with, and in some cases exceeding, prior state of the art. Reference code and data are available at https://github.com/vuptran/sesemi.

artificial intelligence, inductive learning, machine learning, (15 more...)

arXiv.org Machine Learning

1906.10343

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

An overview of proxy-label approaches for semi-supervised learning

#artificialintelligence | Jan-24-2019, 14:21:08 GMT

This post discusses semi-supervised learning algorithms that learn from proxy labels assigned to unlabelled data. Note: Parts of this post are based on my ACL 2018 paper Strong Baselines for Neural Semi-supervised Learning under Domain Shift with Barbara Plank. Unsupervised learning constitutes one of the main challenges for current machine learning models and one of the key elements that is missing for general artificial intelligence. While unsupervised learning on its own is still elusive, researchers have made a lot of progress in combining unsupervised learning with supervised learning. This branch of machine learning research is called semi-supervised learning. Semi-supervised learning has a long history. For a (slightly outdated) overview, refer to Zhu (2005) [1] and Chapelle et al. (2006) [2].

artificial intelligence, machine learning, prediction, (16 more...)

#artificialintelligence

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
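
The post summarized above is about learning from proxy labels assigned to unlabelled data. A common member of that family is self-training, sketched here in a deliberately small form: a nearest-centroid classifier labels the unlabelled pool, and only predictions above a confidence threshold are promoted to training examples. The classifier, the softmax-over-distances confidence, the threshold of 0.9, and the toy data are all assumptions chosen for illustration, not details from the post.

```python
import math

def fit_centroids(X, y):
    """Mean feature vector per class (a stand-in for any base classifier)."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, value in enumerate(x):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative distances to each class centroid."""
    scores = {c: math.exp(-math.dist(x, m)) for c, m in centroids.items()}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

def self_train(X_lab, y_lab, X_unlab, threshold=0.9, max_rounds=10):
    """Repeatedly promote confident predictions on the pool to proxy labels."""
    X_lab, y_lab, pool = list(X_lab), list(y_lab), list(X_unlab)
    for _ in range(max_rounds):
        centroids = fit_centroids(X_lab, y_lab)
        still_unlabelled, added = [], 0
        for x in pool:
            proba = predict_proba(centroids, x)
            label, confidence = max(proba.items(), key=lambda kv: kv[1])
            if confidence >= threshold:
                X_lab.append(x)       # the proxy label is treated as if it were true
                y_lab.append(label)
                added += 1
            else:
                still_unlabelled.append(x)
        pool = still_unlabelled
        if added == 0:                # nothing confident left: stop early
            break
    return fit_centroids(X_lab, y_lab), pool

# Two labelled points, three unlabelled; the midpoint stays ambiguous.
model, leftover = self_train(
    [(0.0, 0.0), (10.0, 10.0)], ["a", "b"],
    [(0.5, 0.5), (9.5, 9.5), (5.0, 5.0)],
)
```

The threshold controls the usual trade-off in this setting: a low value adds more proxy labels but risks reinforcing the model's own mistakes, while a high value leaves more of the pool unused.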

An Overview of Proxy-label Approaches for Semi-supervised Learning

@machinelearnbot | May-26-2018, 15:50:20 GMT

Note: Parts of this post are based on my ACL 2018 paper Strong Baselines for Neural Semi-supervised Learning under Domain Shift with Barbara Plank. Unsupervised learning constitutes one of the main challenges for current machine learning models and one of the key elements that is missing for general artificial intelligence. While unsupervised learning on its own is still elusive, researchers have made a lot of progress in combining unsupervised learning with supervised learning. This branch of machine learning research is called semi-supervised learning. Semi-supervised learning has a long history. For a (slightly outdated) overview, refer to Zhu (2005) [1] and Chapelle et al. (2006) [2]. Particularly recently, semi-supervised learning has seen some success, considerably reducing the error rate on important benchmarks.
